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ABSTRACT Estimates of the fraction of nucleotide substitutions driven by positive selection vary widely 
across different species. Accounting for different estinnates of positive selection has been difficult, in part 
because selection on polymorphism within a species is known to obscure a signal of positive selection 
among species. While methods have been developed to control for the confounding effects of negative 
selection against deleterious polymorphism, the impact of balancing selection on estimates of positive 
selection has not been assessed. In Saccharomyces cerevisiae, there is no signal of positive selection within 
protein coding sequences as the ratio of nonsynonymous to synonymous polymorphism is higher than that 
of divergence. To investigate the impact of balancing selection on estimates of positive selection, we 
examined five genes with high rates of nonsynonymous polymorphism in S. cerevisiae relative to diver- 
gence from S. paradoxus. One of the genes, the high-affinity zinc transporter ZRT1 showed an elevated rate 
of synonymous polymorphism indicative of balancing selection. The high rate of synonymous polymorphism 
coincided with nonsynonymous divergence among three haplotype groups, among which we found no 
detectable differences in ZRT1 function. Our results implicate balancing selection in one of five genes 
exhibiting a large excess of nonsynonymous polymorphism in yeast. We conclude that balancing selection 
is a potentially important factor in estimating the frequency of positive selection across the yeast genome. 
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The frequency of adaptive substitutions driven by positive selection is 
central to our understanding of molecular evolution and divergence 
among species. The neutral theory assumes that most substitutions are 
effectively neutral and generates predictions that can be tested based on 
patterns of molecular evolution (Fay and Wu 2003). While many 
individual genes have been found to deviate from neutral patterns 
of evolution, the overall impact of positive selection across the genome 
remains a contentious issue (Hahn 2008; Sella et al. 2009; Nei et al. 
2010; Fay 2011). 



Copyright © 2013 Engle, Fay 
doi: 1 0.1 534/g3.11 2.005082 

Manuscript received November 29, 2012; accepted for publication February 16, 2013 
This is an open-access article distributed under the terms of the Creative 
Commons Attribution Unported License (http://creativecommons.org/licenses/ 
by/3.0/), which permits unrestricted use, distribution, and reproduction in any 
medium, provided the original work is properly cited. 

Supporting information is available online at http://www.g3journal.org/lookup/ 
suppl/doi:10.1534/g3.112.005082/-/DC1 

^Corresponding author: 4444 Forest Park Blvd, St. Louis, MO 63108. E-mail: jfay@ 
genetics.wustl.edu 



Genome-wide comparisons of polymorphism vs. divergence have 
been the primary means of estimating the frequency of positive selec- 
tion among species. The McDonald-Kreitman (MK) test (Mcdonald 
and Kreitman 1991) has been used to estimate the frequency of pos- 
itive selection within protein coding sequences based on an elevated 
ratio of nonsynonymous to synonymous divergence relative to that of 
polymorphism. However, applications of the MK test to plant, animal, 
and microbial genomes have revealed substantial differences in esti- 
mates of positive selection among species, ranging from zero to over 
half of all amino acid substitutions (Fay 201 1). While the frequency of 
positive selection may differ due to a species' effective population size 
and species- specific selective pressures (Bachtrog 2008; Gossmann 
et al. 2010; Siol et al. 2010; Slotte et al. 2010; Gossmann et al. 
2012), estimating the frequency of positive selection during diver- 
gence among species depends on controlling for the effects of selec- 
tion on polymorphism within species (Fay and Wu 2001; Bierne and 
Eyre- Walker 2004; Hughes et al. 2008). 

Estimates of the frequency of positive selection can be influenced 
by a number of factors that can make it difficult to detect adaptation 
when it is present. Slightiy deleterious polymorphisms segregate at low 
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frequencies due to weak negative selection and can increase the 
nonsynonymous-to-synonymous polymorphism ratio to a greater extent 
than that of divergence. As a consequence, deleterious polymorphism 
can obscure evidence of positive selection (Fay et al. 2002; Bierne and 
Eyre- Walker 2004; Charlesworth and Eyre- Walker 2008; Eyre- Walker 
and Keightley 2009). Methods have been developed to account for the 
effects of low-frequency deleterious polymorphism, but, even so, there 
are still some species with little or no evidence of positive selection 
(Fay 2011). 

A number of other factors can influence the detection of positive 
selection through their effects on slightly deleterious polymorphism. 
Such factors include mating system as well as population size and 
structure. For example, a decrease in population size can increase the 
abundance of slightly deleterious polymorphism within a species and 
obscure evidence of positive selection among species (Eyre-Walker 
2002). In humans, there is little evidence for an excess of nonsynon- 
ymous divergence, yet it has been estimated that up to 40% of amino 
acid substitutions could have been driven by positive selection without 
being detected (Eyre- Walker and Keightley 2009). Controlling for 
these additional factors is often difficult as it requires specific knowl- 
edge of the species being examined and its population history. 

Another factor that has received less attention but can also influence 
estimates of positive selection is balancing selection (Wright and Andolfatto 
2008). The maintenance of multiple nonsynonymous polymorphisms 
within a species by balancing selection could increase the genome- 
wide ratio of nonsynonymous to synonymous polymorphism within 
a species above the ratio of nonsynonymous-to-synonymous divergence 
among species. Elevated rates of nonsynonymous polymorphism may 
also occur because of local adaptation (Charlesworth et al 1997). If an 
appreciable number of genes are involved in adaptive divergence be- 
tween different populations of the same species, genome- wide estimates 
of the frequency of positive selection among species could be substan- 
tially underestimated. 

The yeast Saccharomyces cerevisiae is one species with littie or no 
evidence of positive selection based on the MK test (Doniger et al 
2008; Liti et al 2009; Elyashiv et al 2010). In contrast to other species 
that lack evidence of positive selection (Foxe et al 2008; Gossmann 
et al 2010; Gossmann et al 2012), its large effective population size 
ensures the efficient removal of weakly deleterious mutations and the 
ability to fix weakly advantageous mutations. However, S. cerevisiae 
also exhibits strong population structure, potentially facilitated by its 
low rate of outcrossing, /.e., mating between unrelated parents (Ruderfer 
et al 2006), low rate of migration, or local adaptation to the diverse 
array of environments from which it has been isolated (Fay and 
Benavides 2005). Genome-wide patterns of population structure 
have revealed a number of genetically differentiated groups, includ- 
ing strains originating from sake in Japan, vineyards in Europe, and 
oak trees in North America (Liti et al 2009; Schacherer et al 2009). 
While these groups may have arisen as a result of geographic bar- 
riers, they might also have arisen as a consequence of domestication 
or adaptation to human-modified environments (Fay and Benavides 
2005). However, even when these groups are taken into consider- 
ation and examined separately, the ratios of nonsynonymous to 
synonymous polymorphism within or between groups are higher 
than the ratio of nonsynonymous to synonymous divergence among 
species (Elyashiv et al 2010). 

In this study, we tested the hypothesis that genes with a large 
excess of nonsynonymous polymorphism are underbalancing selec- 
tion. We reasoned that such genes have a disproportionate effect on 
estimates of positive selection and should be considered separately if 
underbalancing selection. We examined five genes that were previously 



shown to contain a large excess of nonsynonymous to synonymous 
polymorphism (Doniger et al 2008; Liti et al 2009). To distinguish 
between purifying selection and balancing selection on nonsynon- 
ymous polymorphism, we examined rates of synonymous polymor- 
phism as negative selection is expected to decrease linked neutral 
variation, whereas balancing selection is expected to increase linked 
neutral variation (Charlesworth et al 1997). We found one of the 
genes, ZRTl, showed a significantly elevated rate of synonymous 
polymorphism based on the Hudson-Kreitman-Aguade (HKA) test 
(Hudson et al 1987), consistent with balancing selection. Our results 
show that a large number of amino acid polymorphisms can occur at 
certain loci, underbalancing selection. 

MATERIALS AND METHODS 
Polymorphism and divergence data 

Data were collected for five genes that were previously found to 
exhibit an excess of nonsynonymous polymorphism in two studies 
(Doniger et al 2008; Liti et al 2009) and 30 randomly selected control 
genes, using 36 S. cerevisiae strains with genome sequence data 
(Supporting Information, Table SI). Twenty-seven of the genome 
sequences were accessed through a BLAST server (www.moseslab.csb. 
utoronto.ca/sgrp/), and the other 9 were accessed through the Sac- 
charomyces Genome Database (www.yeastgenome.org). For each gene, 
sequences homologous to the coding region of the reference genome 
(S288C) were aligned using Clustal X version 2.0 software (Larkin et al 
2007). Strains with sequences that were <99% and <90% of the 
S288C sequence length for the neutral and selected genes, respec- 
tively, were removed. The cutoff of <90% was used to accommodate 
1RA2, for which few strains had BLAST hits covering the entire 9.2 kb 
of coding sequence present in the reference genome. Strains were 
removed from a gene analysis if a polymorphism led to an internal stop 
codon (4 cases), while unique single-base insertions were considered 
sequencing errors and the base was removed from the sequence 
(10 cases). A small number of heterozygous sites were present within 
the strains Vinl3, VL3, and LalvinAQ23. At these sites, we randomly 
selected one of the two observed nucleotides to represent the position. 
Divergence was measured by comparison to the CBS432 strain of 
Saccharomyces paradoxus. 

The final dataset included an average of 29.9 strain alleles per gene, 
ranging from 23 to 36, and the five-gene set included an average of 
21.4 strain alleles per gene, ranging from 18 to 24. Eight of the control 
genes were removed from analysis because they had few strains with 
sufficient sequence coverage or multiple strains with frameshifts. One 
gene, RPS28B, was removed because it showed evidence of introgres- 
sion between species (Doniger et al 2008). The final dataset is avail- 
able in File SI. 

Population genetic analysis 

MK tests were conducted using the number of nonsynonymous and 
synonymous polymorphic sites and fixed differences calculated using 
DnaSP version 5.10.01 software (Librado and Rozas 2009). The 
weighted neutrality index (Stoletzki and Eyre- Walker 2011) was esti- 
mated by the equation: 



where P and D are the number of polymorphic sites and fixed 
differences, respectively; subscripts s and n indicate synonymous 
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and nonsynonymous changes, respectively; and / indicates the ith 
gene. 

HKA tests were conducted using maximum likelihood HKA test 
(MLHKA version 2.0) software (Wright and Charlesworth 2004) with 
rates of synonymous polymorphism and divergence obtained from 
DnaSP (Table S2). Each of the five genes found to be significant by 
the MK test were compared to 21 control genes, covering 22,974 sites, 
using the MLHKA test. The program was run with a chain length of 
100,000 interactions for all analyses. 

For analysis of the region surrounding ZRTU from MNT2 through 
FZFl (~11 kb), we downloaded S. cerevisiae and S. paradoxus strain 
sequences from the Saccharomyces Genome Resequencing Project 
(Liti et al 2009). A sliding window analysis of polymorphism and 
divergence was calculated with DnaSP using 36 S. cerevisiae strains 
and 1 S. paradoxus strain, CBS432, with gaps in the alignment 
excluded. Because of difficulty in aligning the 4.67-kb noncoding 
region between ADH4 and ZRTl, we used only ~200 bases down- 
stream of ADH4 and 800 bases upstream of ZRTl, where we were 
confident of alignment. 

Bootstrapped neighbor-joining trees for ZRTl and the concat- 
enated control gene set were constructed using MEGA5 and pair- 
wise gap removal (Tamura et al 2011). 

Strain construction and phenotype analysis 

ZRTl was deleted in YJF186 (YPS163 background. Mat a, HO:: 
dsdAMX4y ura3-140) by using the kanMX deletion cassette (Wach 
et al 1994). Three ZRTl alleles were integrated into this strain at the 
ura3 locus by amplifying the entire ZRTl gene region, including 878 
bases of the 5' noncoding region and the entire 195 bases of the 3' 
noncoding region, as well as 186 bases of the 3' gene FZFl, using 
primers with homology to pRS306, and transforming the product 
along with the yeast integrative plasmid pRS306 (Sikorski and Hieter 
1989). Integration of these constructs at the ura3 locus was achieved 
by selection on plates lacking uracil, and each transformant was 
confirmed by PGR. The ZRTl alleles were found to have between 
1 and 3 mutations. However, most alleles had only single synonymous 
changes or changes within the 5' or 3' regions, and no mutations were 
shared among the alleles, including the replicated transformants. 
These mutations were considered not fianctional because of the lack of 
any phenotypic effects. Wild- type (YJF186) and ZRTl deletion strains 
were integrated with the empty plasmid pRS306 as a control. 

Experiments comparing growth under low zinc conditions were 
conducted using low zinc media (LZM) composed of 0.17% yeast 
nitrogen base without amino acids, (NH)2SO, or zinc (MP Biomed- 
icals); 0.5% (NH4)2S04; 20 mM trisodium citrate, pH 4.2; 2% glucose; 
1 mM NasEDTA; 25 |jlM MnCh; and 10 |jlM FeGls, as previously 
described (Gitan et al 1998; Gitan et al 2003). Strains were grown 
overnight in LZM, washed, diluted to a starting optical density (OD) 



of 0.05 (absorbance at 600 nm) in fresh LZM with 0.1 mM of ZnCl2, 
and then grown for 20 hr in a plate reader at 30°C with shaking at 
1200 rpm (iEMS model 1400; Thermo Lab Systems, Helsinki, Finland). 
For each strain, the maximum OD was determined afi;er normalization 
to the initial cell concentration. For each ZRTl construct and controls, 
3-9 independent transformants were phenotyped. 

Rates of fermentation were measured using grape juice. Strains 
were grown overnight in Reserve Chardonnay grape juice (Winexpert, 
Port Coquitiam, BC, Canada), washed, and diluted to a starting OD of 
0.1 in fresh grape juice or grape juice with metal chelators (20 mM 
trisodium citrate, pH 4.2, and 1 mM Na2EDTA). Fermentation was 
conducted in 2 50 -ml flasks sealed with airlocks and incubated at room 
temperature, out of direct sunlight without shaking. Flasks were 
weighed daily to determine CO2 loss and shaken once daily, imme- 
diately following measurement. Four independent transformants were 
examined for each construct. 

RESULTS 

Identification of genes exhibiting an excess of amino 
acid polymorphism 

From two previous independent genome-wide screens based on the 
McDonald-Kreitman (MK) test, we identified five genes that were 
significant (P < 0.001) in both studies (Doniger et al 2008; Liti et al 
2009). The five genes were IRA2, 0PT2, PEPl, SASIO, and ZRTL All 
of the genes showed a ratio of nonsynonymous to synonymous poly- 
morphism (PJPs) that was more than twofold greater than that of 
divergence {DJD^. We repeated the MK test using a strain set com- 
posed of 36 strains for which genome sequences are publicly available 
(see Materials and Methods and Table SI), of which 29 were not 
included in previous screens. AU five genes retained significance 
according to the MK test results (P < 0.05, Bonferroni corrected) 
(Table 1). As a control, we randomly selected 21 genes from those 
not significant in either previous study. Only two of the genes were 
significant according to the MK test (P < 0.05, Bonferroni corrected) 
(Table S2), both were characterized as having a PJPs ratio greater 
than that of DJD^. 

The two gene sets exhibited marked differences in their neutrality 
indexes (Table 1). The weighted neutrality index (Stoletzki and Eyre- 
Walker 2011) is 3.80 for the five selected genes and 1.33 for the 21 
control genes. The high neutrality index of the five genes, indicating an 
excess of nonsynonymous polymorphism, is unlikely a consequence of 
selective pressure on synonymous sites as average codon bias is similar 
between the two groups, and the five genes have nearly equal numbers 
of changes to preferred and unpreferred codons for both polymor- 
phism, 63 and 58 respectively, and divergence, 414 and 397, respec- 
tively. Thus, the five-gene set is highly enriched for genes with an 
excess of nonsynonymous polymorphism relative to that of divergence. 



■ Table 1 McDonald-Kreitman (MK) test results 



Gene(s) 


Number of Sites 




Ps' 






MK P Value 


Neutrality Index 


IRA2 


9117 


69 


108 


131 


735 


0.000000 


3.6 


OPT2 


2436 


35 


20 


18 


187 


0.000000 


18.2 


PEP1 


4293 


48 


45 


168 


320 


0.002240 


2.0 


SASW 


1680 


12 


7 


41 


120 


0.002186 


5.0 


ZRT1 


1113 


55 


45 


12 


62 


0.000000 


6.3 


5 Genes^ 


18,639 


219 


225 


370 


1424 




3.8 


21 Genes'^ 


22,815 


158 


260 


711 


1623 




1.3 



Pn and Ps are the number of nonsynonymous (Pp) and synonymous (Ps) polymorphic sites. and Dg are the number of nonsynonymous (DJ and synonymous (Dg) 
fixed differences. '^The 5 genes and 21 control genes are described in the text. 
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Figure 1 Synonymous polymorphism vs. divergence. Synonymous nu- 
cleotide diversity (iTsyn) vs. synonymous nucleotide divergence (/Csyn) is 
shown for the five selected genes (red), the 21 control genes (black), 
and the three genes neighboring ZRT1 (blue). Nucleotide diversity was 
measured by the average number of pairwise differences among 
strains of S. cerevisiae, and nucleotide divergence was measured by 
differences between S. cerevisiae and S. paradoxus. 



Balancing selection in ZRT1 

A high ratio of nonsynonymous to synonymous polymorphism can 
result from slightly deleterious nonsynonymous mutations that con- 
tribute to polymorphism but not divergence or from a recent loss of 
functional constraint. In either scenario, the rate of synonymous 
polymorphism should not be affected. Alternatively, a high rate of 
nonsynonymous polymorphism can result from balancing selection 
on multiple nonsynonymous alleles. If balancing selection is re- 
sponsible for the elevated rate of nonsynonymous polymorphism, 
rates of linked synonymous polymorphism should also be elevated 
(Charlesworth et al 1997). 

To test whether the rate of synonymous polymorphism was 
elevated in any of the five genes, we used the HKA test (Hudson et al. 
1987). Using a maximum likelihood MLHKA test, we found only 
ZRTl showed a significantly elevated rate of synonymous polymor- 
phism in comparison to that of the control gene set (Table S3). Figure 
1 shows that of all the genes we tested, ZRTl was characterized by an 



exceptionally high rate of synonymous polymorphism. One gene in 
the control gene set, TRX2, appeared to be an outlier characterized by 
both a high rate of synonymous polymorphism and a low rate of 
synonymous divergence. TRX2 is nominally significant for an excess 
synonymous polymorphism relative to the remaining neutral gene set, 
by MLHKA test results (P = 0.0132). However, removal of TRX2 from 
the neutral gene set only increased the significance of ZRTl. Thus, of 
the five genes, only ZRTl exhibits evidence of balancing selection. 

Balancing selection of ZRTl is also supported by patterns of non- 
synonymous and synonymous divergence among strains. A neighbor- 
joining tree of ZRTl shows that all of the ZRTl alleles, except for the 
EC 111 8 allele, cluster into three groups distinguished by multiple 
nonsynonymous and synonymous differences (Figure 2). With the 
exception of the wine strain group, the groups showed no close cor- 
respondence to the source from which each strain was obtained or to 
a neighbor-joining tree generated from the concatenated control gene 
set (Figure 2 and Figure SI). Of the 111 polymorphic sites used to 
generate the tree, 86 can be placed on a single branch without homo- 
plasious traits, of which 24 nonsynonymous and 17 synonymous 
changes occurred on one of the four main internal branches (Figure 2). 
In comparison, 62 synonymous but only 12 nonsynonymous differ- 
ences separate S. cerevisiae from S. paradoxus. Thus, the high ratio of 
nonsynonymous to synonymous polymorphism is not limited to 
external branches, as would be expected to occur if most nonsynon- 
ymous polymorphisms were deleterious. 

The presence of intermediate frequency alleles, many of which 
contribute to the unique grouping of ZRTl alleles, also supports bal- 
ancing selection. For synonymous sites, results of Tajima's D test were 
positive across all strains (D = 0.718, P > 0.10) but negative within 
each of the three ZRTl strain groups (M22 group D = —1.265; 
YPS163 group D = -0.59894; and S288C group D = -0.61182; aU 
P > 0.10). In comparison, the average D result of the control gene set 
was —0.294, and only four of the genes had positive D values greater 
than 0.3. These results further highlight the unique pattern of varia- 
tion present in ZRTl. 

Regional variation around ZRT1 

The elevated rate of synonymous polymorphism in ZRTl could be 
a consequence of balancing selection on ZRTl amino acid polymor- 
phism but also could be caused by selection on its promoter or on 
adjacent genes. To determine whether the signal of balancing selection 
extends into adjacent genes and gene regions, we applied the MLHKA 
test to ADH4 and FZFU the two genes adjacent to ZRTl. Only ADH4 
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Figure 2 Neighbor-joining tree of ZRT1 
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was significant in comparison to the control gene set (MLHKA test, 
P = 0.0007) (Table S3) and was characterized by both high rates of 
polymorphism and also low rates of divergence at synonymous sites 
(Figure 1). Hence, we also tested MNT2, the next gene adjacent to 
ADH4, and found no significant departure from neutrality as mea- 
sured by the MLHKA test. Additionally, none of the three adjacent 
genes that were examined showed a significant excess of amino acid 
polymorphism as measured by the MK test (Table S2). 

To more precisely track the signal of balancing selection within 
and around ZRTU we used a sliding window analysis of polymor- 
phism to divergence, including both coding and noncoding regions. 
Figure 3 shows that the highest rate of polymorphism occurred within 
the coding region of ZRTl and extended into its 5' noncoding region. 
The overall rate of polymorphism is much lower in the two adjacent 
genes ADH4 and FZFl. In ADH4, the rate of divergence is also quite 
low and likely contributes to the significance of the MLHKA test. 
Interestingly, a portion of MNT2 had a very low rate of polymor- 
phism, whereas its more distal portion had another peak of polymor- 
phism. Based on the sliding window analysis and the MLHKA test 
results, the signature of balancing selection appears to be concentrated 
at the ZRTl locus. 

We next examined the degree to which polymorphism within 
ZRTl is independent of polymorphism within adjacent genes. There 
is ample evidence for recombination within and around ZRTl. Across 
the entire region, from MNT2 through FZFl, there have been 
a minimum of 26 recombination events based on the four-gamete 
test (Hudson and Kaplan 1985). As expected in the presence of 
recombination, the genealogies of ADH4 and FZFl differed from 
that of ZRTl (Figure S2), although all three genes showed a similar 
grouping of wine strains. ADH4 was the most similar to ZRTl but 
had less divergence. As measured by the HKA test, ZRTl showed 
significantly elevated rates of polymorphism compared to FZFl (P = 
0.0087) but not ADH4 or MNT2 (P > 0.05). 

ZRT1 alleles confer no detectable 
phenotype differences 

If selection has acted on ZRTl, then different alleles of ZRTl should 
confer different phenotypes. ZRTl is a high-affinity zinc transporter 
that is activated only when zinc levels are very low and facilitates 
growth under limiting zinc conditions (Zhao and Eide 1996). We 
compared the effects of three ZRTl alleles integrated into a strain in 
which the endogenous ZRTl gene was deleted. The three alleles were 
from the S288C (laboratory), M22 (wine), and YPS163 (nature) 
strains and were selected as representatives from the three major 
groups of strains (Figure 2). While deletion of ZRTl resulted in 
a significant growth defect in zinc-limiting conditions and each of 
the three alleles rescued the growth deficiency, we found no signif- 
icant difference among the three ZRTl alleles for maximum growth 
(Figure 4) or growth rate (not shown). 

In addition to its requirements for growth, zinc is an essential 
cofactor for many enzymes, including alcohol dehydrogenase, and has 
been shown to influence rates of fermentation (De Nicola and Walker 
201 1). To test whether the ZRTl alleles affect rates of fermentation, we 
measured CO2 release during fermentation of grape juice into wine. In 
the presence of metal chelators, deletion of ZRTl had a dramatic effect 
on the rate of fermentation, but no differences were found among the 
three ZRTl alleles tested (analysis of variance [ANOVA], P > 0.05) 
(Figure 5). No differences in rates of fermentation were found among 
any of the four strains in grape juice without chelators at any of the 
time points (ANOVA, P > 0.05). 
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Figure 3 Sliding window analysis of polymorphism and divergence 
within and around ZRTL The sliding window plot includes ZRT1 and 
three neighboring genes, with their positions and orientations indi- 
cated below the graph. Polymorphism (solid line) and divergence from 
S. paradoxus (dashed line) are shown for a window size of 200 bp and 
step size of 50 bp. A break is shown between ADH4 and ZRT1 where 
-3600 intergenic bases were excluded because of uncertainty in the 
alignment with S. paradoxus. The average nucleotide diversity of syn- 
onymoussites for the control gene set is indicated by the gray hori- 
zontal line. 



The ability of each ZRTl allele to rescue the ZRTl deletion phe- 
notypes indicates that none of the 38 amino acid polymorphisms that 
distinguish these three ZRTl alleles caused a substantial loss of func- 
tion as measured by growth or fermentation rate under the conditions 
assayed. 

DISCUSSION 

Application of the MK test to a variety of species has revealed 
substantial differences in the estimated frequency of positive selection 
on protein coding sequences (Fay 2011). While differences in effective 
population size are capable of explaining some of the differences 
among species (Eyre-Walker and Keightley 2009; Gossmann et al. 
2010; Halligan et al 2010; Siol et al 2010; Slotte et al 2010; Gossmann 
et al 2012), a small effective population cannot explain the absence of 
evidence for positive selection in yeast. In this study, we examined 
whether balancing selection can explain the high rate of nonsynon- 
ymous polymorphism observed in a small set of genes exhibiting 
a disproportionately large excess of nonsynonymous polymorphism, 
as this could obscure evidence of positive selection in S. cerevisiae. We 
showed that one of the five genes tested exhibited a significantiy ele- 
vated rate of synonymous polymorphism, indicative of balancing 
selection. While patterns of polymorphism and divergence around 
ZRTl suggest that nonsynonymous polymorphism within ZRTl itself 
is the most likely target of balancing selection, we found no functional 
differences among three alleles, using two different phenotype assays. 
Our results illustrate how balancing selection might obscure a signal of 
positive selection. 

Balancing selection in ZRT^ 

Evidence for balancing selection on ZRTl is based on an elevated rate 
of synonymous polymorphism as measured by the HKA test (Figure 1 
and Table S3), a high ratio of polymorphism to divergence that is 
centered on ZRTl (Figure 3), an increased frequency of intermediate 
frequency alleles, and the coincidence of multiple synonymous and 
nonsynonymous changes that distinguish three groups of strains (Fig- 
ure 2). However, it is worth noting that balancing selection in the 
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Figure 4 Effects of strain-specific ZRT1 alleles on growth under low- 
zinc conditions. The maximum cell density under low-zinc conditions is 
shown for YPS163 with an unmodified ZRT1 allele (WildType), the 
same strain with a deletion of ZRT1 {ZRT1 deletion), and three ZRT1 
alleles integrated into the ZRT1 deletion strain (YPS163, S288C, and 
M22). The three integrated alleles represent alleles from the three 
major strain groupings based on ZRT1. Error bars show the 95% con- 
fidence interval of the mean. 
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Figure 5 Effects of strain-specific ZRT1 alleles on fermentation rate. 
Fermentation rate, measured by CO2 release (grams per hour), for 
strains grown in grape juice containing metal chelators. All strains have 
ZRT1 deleted, and three have either an S288C (orange), YPS163 (yel- 
low), or M22 (green) allele of ZRT1 inserted at the URA3 locus. Lines 
show the average of four replicates, each from an independent trans- 
formation. Standard deviations are not shown for clarity and average 
between 0.0028 and 0.0056 for the four strains. 



general sense (i.e., selective maintenance of distinct alleles) can result 
from temporal or spatial variation in selection coefficients as well as 
heterozygote advantage. Local adaptation, which can result from spa- 
tial variation in selection coefficients, also provides an explanation for 
the presence of multiple nonsynonymous differences among alleles. 
Yet, our results are not able to distinguish between these different 
forms of selection, but rather distinguish them from patterns that 
can be explained by population structure, loss of selective constraint, 
and selection on adjacent genes. 

In S. cerevisiae, there is extensive population structure related to 
both geographic origin and the ecological source from which each strain 
was isolated (Fay and Benavides 2005; Liti et al. 2009; Schacherer 
et al. 2009), which are frequently correlated with one another. Such 
groups include sake strains from Japan, oak tree strains from North 
America, and strains isolated from Europe or vineyards. The neigh- 
bor-joining tree of the 21 control genes generally recapitulates these 
previously defined groups. While the ZRTl tree bears some resem- 
blance to that of the control gene set, particularly the vineyard 
group, the three main groups of strains that differ at ZRTl are not 
obviously related by either their geographic origin or the ecological 
source from which they were isolated. More importantly, population 
structure by itself does not explain the elevated rates of polymor- 
phism at ZRTl relative to the 21 -control-gene set. In support of 
selection acting at ZRTl, we observed negative Tajima's D values 
for most of the control gene set but a positive Tajima's D value at 
ZRTl, consistent with balancing selection. 

Loss of functional constraint or weak negative selection is another 
explanation for an excess of nonsynonymous polymorphism as 
measured by the MK test. Genome-wide estimates in yeast suggest 
that much of the nonsynonymous polymorphism may be weakly 



deleterious (Elyashiv et al. 2010). In the case of ZRTl, we cannot 
exclude the possibility that some of the nonsynonymous polymor- 
phisms are neutral or slightly deleterious. However, two lines of 
evidence indicate that at least some of the nonsynonymous changes 
within ZRTl have been underbalancing selection. First, many neu- 
tral and most deleterious polymorphisms are expected to be rare and 
present only in a small number of strains. While 78% of nonsynon- 
ymous alleles are at less than 10% frequency in the 21-control-genee 
set, only 38% of nonsynonymous polymorphism are at less than 10% 
frequency in ZRTl. In addition, of the nonsynonymous changes that 
are specific to one or more lineages, 55% are positioned along the 
four internal branches that distinguish the three major groups of 
ZRTl alleles. Second, neither loss of constraint or negative selection 
on nonsynonymous polymorphism should increase variation at linked 
synonymous sites. 

While patterns of variation within and around ZRTl indicated that 
it is the most likely target of balancing selection, selection on linked 
sites could have influenced observed patterns of variation at ZRTl. 
Patterns of polymorphism within FZFl, the adjacent gene, indicated 
no excess of synonymous or nonsynonymous polymorphism. How- 
ever, FZFl may have experienced a recent selective sweep as there is 
evidence of positive selection during S. cerevisiae and S. paradoxus 
divergence, both within its coding region and within the intergenic 
region between ZRTl and FZFl (Sawyer et al. 2005; Engle and Fay 
2012). Patterns of polymorphism within the adjacent genes ADH4 and 
MNT2 are more complex. Both genes show rates of synonymous 
polymorphism that are higher than those of the 21-control-genee 
set, except for the outlier gene TRX2. MNT2 shows regions with high 
and low polymorphism levels, but the region closest to ZRTl has the 
lower rate of polymorphism (Figure 3). ADH4 shows a significanfly 
elevated rate of synonymous polymorphism relative to divergence by 



670 I E. K. Engle and J. C. Fay 



r-^.G3' Genes \ Genomes | Genetics 



the HKA test. Yet, in comparison to other regions (Figure 3) the 
significance of ADH4 appears to be a partial consequence of the 
low rate of synonymous divergence. These observations combined 
with the ample evidence for recombination within the region in- 
dicate that while sites within MNT2 and ADH4 may have also been 
under selection, selection on linked sites in adjacent regions are 
unlikely to be solely responsible for the high rate of synonymous 
polymorphism at ZRTl. 

Relevant to the possibility of selection on adjacent genes, there are 
functional links between ADH4 and ZRTL ADH4 and ZRTl are both 
activated by ZAPl in zinc limiting conditions (Lyons et al 2000), and 
ADH4 is an alcohol dehydrogenase that may help conserve zinc or 
work more efficiently under zinc limiting conditions (Bird et al 2006; 
De Nicola et al 2007), or during fermentation of sugars to ethanol 
(Zhao and Bai 2012). Interestingly, the closest homologs of ZRTl 
outside of those present within the sensu stricto Saccharomyces species 
are from two distantly related species commonly found in wine fer- 
mentations, Lachancea thermotolerans and Zygosaccharomyces rouxii 
(Combina et al 2005; Romancino et al 2008), rather than other more 
closely related species, suggesting that ZRTl may have been intro- 
gressed into the ancestral lineage of the Saccharomyces species. The 
subtelomeric physical location of ZRTl in S. cerevisiae is consistent 
with other genes acquired by horizontal gene transfer (Hall et al 2005; 
Muller and Mccusker 2011) and genes likely to be involved with 
adaptations to specific environments (Brown et al 2010). 

Phenotypic effects of ZRTl alleles 

We found that the three distinct ZRTl alleles conferred no detectable 
phenotypic differences from one another. Although the alleles were 
not integrated at the endogenous ZRTl locus, each was able to fully 
rescue the ZRTl deletion phenotype (Figure 4). This result indicates 
that under the conditions tested, none of the nonsynonymous differ- 
ences among the three alleles caused a substantial loss of ZRTl func- 
tion. The lack of phenotypic differences among the different ZRTl 
alleles implies that either the alleles are fianctionally equivalent to one 
another and so are not involved in balancing selection or that the lack 
of a discernible phenotype is a consequence of the conditions tested or 
an effect too small to be detected. For example, ZRTl, as a metal 
transporter, could also influence fitness due to transport of other 
metals, such as cadmium (Gitan et al 1998; Gomes et al 2002; Gitan 
et al 2003). 

Prevalence of balancing selection 

Of the five genes that exhibited an excess of nonsynonymous poly- 
morphism according to MK test results, only ZRTl showed evidence of 
balancing selection. The excess of nonsynonymous polymorphism in the 
other four genes is most likely a consequence of loss of fiinctional 
constraint or slightiy deleterious polymorphism. Interestingly, alleles of 
IRA2, a GTPase that negatively regulates RAS signaling, are responsible 
for numerous environment-specific differences in gene expression across 
the genome (Smith and Kruglyak 2008), and alleles of IRA2 also have 
been shown to affect high temperature growth (Parts et al 2011). How- 
ever, 1RA2 does not show an excess of synonymous polymorphism as 
measured by the HKA test. 

The prevalence of balancing selection across the entire yeast 
genome is more difficult to assess. The observation that rates of 
nonsynonymous and synonymous polymorphism are correlated 
with one another provides some evidence for the possibility of weak 
balancing selection throughout the yeast genome (Cutter and Moses 
2011). However, genes with high rates of synonymous polymor- 
phism do not show a tendency toward an excess of nonsynonymous 



polymorphism (Kendall's tau = —0.15) (Figure 6) as predicted by the 
MK test using the data of Liti et al (2009). The challenge to inter- 
preting genome-wide evidence for balancing selection is that many 
cases of balancing selection may be difficult to detect. First, the effect 
of balancing selection on linked variation decreases as a function of 
the rate of recombination; nucleotide diversity is 1 + 1/4 Nr(l-F) 
relative to a neutral locus, where N is the effective population size 
and r is the rate of recombination and F is the inbreeding coefficient 
(Charlesworth et al 1997). Using a rate of recombination of 3.5 x 
10~^/bp, a rate of outcrossing of 2 x 10 "^/generation, and an effec- 
tive population size of 1.6 x 10^ cells (Ruderfer et al 2006), we 
expect diversity to be increased by a factor of 10 and 2, 13 bp and 
113 bp from a site underbalancing selection, respectively. Gene con- 
version is expected to narrow this window even further (Andolfatto 
and Nordborg 1998). Second, balancing selection must act over many 
generations, on the order of the effective population size (Navarro 
et al 2000), to noticeably influence linked neutral variation. However, 
the ability to detect balancing selection may be increased if there are 
multiple selected sites at a single locus, which might be the case for 
genes identified by a high rate of nonsynonymous polymorphism. 
Thus, it is hard to rule out the possibility that balancing selection 
has inflated the rate of nonsynonymous polymorphism across many 
genes without generating a strong effect on linked synonymous sites. 

Why is there little evidence of adaptive evolution within 
the yeast genome? 

An important and persistent question in genome-wide estimates of 
adaptive evolution based on the MK test is why some species show 
high rates of adaptive evolution whereas others, such as yeast, do not. 
A small effective population size is one explanation as adaptive 
substitutions are expected to be more infrequent and deleterious 
polymorphism more common. This provides a reasonable explanation 
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Figure 6 No abundance of genes exhibiting high rates of synonymous 
polymorphism and an excess of nonsynonymous polymorphism. The 
rate of synonymous polymorphism, measured by the number of 
synonymous single nucleotide polymorphisms (SNPs) per codon 
compared to the observed (obs.) minus the expected (exp.) number 
of nonsynonymous SNPs (data are from Liti et al. 2009). The expected 
number of nonsynonymous SNPs was derived from Pn ~ PsX (Dp/Ds), 
where Pn and Ps are the number of nonsynonymous and synonymous 
SNPs, respectively, and Dp and Dg are the number of nonsynonymous 
and synonymous fixed differences, respectively. Genes with nonsynon- 
ymous polymorphism below —10 or above 10 are shown as points at 
the values of -10 or 10, respectively. The red point is ZRT1. 
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for the absence of signal in humans and many plant species (Eyre- 
Walker and Keightley 2009; Gossmann et al 2010; Halligan et al 
2010; Siol et al 2010; Slotte et al 2010; Gossmann et al 2012). 
However, it does not explain the lack of signal in yeast, which has 
a large effective population size, on the order of 10^ for S. paradoxus 
(Tsai et al 2008) and S. cerevisiae (Ruderfer et al 2006). The rate of 
outcrossing may also be relevant to detecting selection in yeast. Self- 
ing helps purge recessive deleterious alleles but also limits recombi- 
nation between different haplotypes. Despite the presence of selfing 
in yeast, S. cerevisiae exhibits an excess of rare nonsynonymous 
polymorphism indicative of deleterious alleles and a rapid decay 
in levels of linkage disequilibrium, an observation that can be attrib- 
uted to its exceptionally high rate of recombination even if outcross- 
ing is rare. Thus, there is no obvious aspect of S. cerevisiae diversity 
that distinguishes it from outcrossing species. Furthermore, both 
a selfing and an outcrossing species of Arabidopsis show no signal 
of adaptive evolution (Foxe et al 2008). As it stands, neither pop- 
ulation size nor selfing provide a particularly compelling explanation 
for why yeast show little adaptive evolution based on the MK test. 

In the present study, we considered the possibility that balancing 
selection obscured patterns of positive selection in yeast. While we 
focused only on a small number of genes exhibiting a large excess of 
nonsynonymous polymorphism, we found one that exhibited evi- 
dence of balancing selection. Thus, while our results highlight the 
need to consider balancing selection in estimating the frequency of 
positive selection in yeast, it remains difficult to assess the impact of 
this consideration. We conclude that balancing selection is a poten- 
tially important factor in estimating the frequency of positive selection 
in yeast. While not emphasized here, it is also important to consider 
whether adaptive evolution is rare and estimates of positive selection 
are inflated in other species (Fay 2011). 
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